Visualizing ML Search Results

Code: By Leo Rizk | Notebook: Peter Ma

This notebook is a step by step guide in visualizing the outputs from the ML SETI search and how to interact with the outputted datasets. Any concerns regarding the code feel free to reach out.

In [1]:
# Import libs and utils 
import os
from typing import List, Tuple
import numpy as np
import matplotlib.pylab as plt
import blimpy as bl
from astropy.time import Time
import qrcode
import moviepy.video.io.ImageSequenceClip

from utils import plot_waterfall, make_waterfall_plots, get_cadence_list, get_cadence_h5files, is_in_rfi_bin, get_hits
from utils import get_num_hits_per_cadence, plot_highlighted_hist, scatter_plot_all_hits, hist_all_hits
from utils import save_visualizer, plot_everything, plot_from_ID, get_rfi_bins, generate_and_save_visualizers
from utils import generate_movie, generate_and_save_visualizers_and_movie
from utils import get_num_hits_above_threshold, get_filtered_num_hits_above_threshold
from utils import Cadence 
from utils import Hit 
%matplotlib inline

Filtering Stages

Currently there exists three main secondary filtering stages.

  1. Cut frequencies that are known to contain instrumental errors. $1.1$GHz and lower, between $1.2-1.34$GHz and above $1.9$GHz in L band
  2. Remove all hits below a confidence threshold of $99\%$.
  3. Remove all frequency bins where the hit occurance is above a certain threshold RFI_FILTER=20,000constant ## Constants First we need to set some constants that will help either filter down the search, or mandatory constants for the size of the windows to show and the telescope name etc..
In [2]:
# CONSTANTS

MAX_IMSHOW_POINTS = (8192, 4096 / 8)
TELESCOPE = 'GBT'
OBS_TIME = '300s x 3'
COLUMN = 52

BAD_FREQS = [[0, 1100], [1200, 1340], [1900, 2000]]

grid = plt.GridSpec(12, 8, wspace=1.2, hspace=0.1)

Special Constant

In [3]:
RFI_FILTER = 20000

Single Plot From Single Target

We can plot a specific hit found in one specific target list given some confidence threshold we wish to look above. Note that the j var stands for the number of the hit in the list. And that i is the ith index of the cadence list/ the directories

In [4]:
results = '../GBT_pipeline/result_BLPC0'
In [5]:
all_cadences = get_cadence_list(results)
num_hits_list = get_num_hits_per_cadence(all_cadences, BAD_FREQS, results)
In [6]:
i = 115 # cadence 
j = 3914 # hit within cadence

cadence = Cadence(all_cadences[i], results, BAD_FREQS)
hit = Hit(cadence, j)

plot_everything(hit, all_cadences, num_hits_list, grid)

Ploting and Saving All Hits From Specific Cadence

We can also plot all the hits above a certain threshold of a specific cadence using the following set of code. Note that t is the threshold value.

In [7]:
# Saving png images for hits at or above a chosen confidence threshold
all_cadences = get_cadence_list(results)
# Picking a specific cadence 
sample_cadences = all_cadences[52]

t = 0.999


cadence = Cadence(sample_cadences, results, BAD_FREQS)

for j in range(cadence.num_hits):
    if cadence.hits[j][2] >= t:

        hit = Hit(cadence, j)

        plot_everything(hit, sample_cadences, num_hits_list, output_dirname='visualizers_clean_test_3', 
                        save=True)
        plt.close()

Statistics of Entire Search

We can also look at the distribution of all the hits across the board. First we can look at the total number of hits across all the searched cadences.

In [8]:
# Observing the number of hits per cadence:

tab = ' ' * (14 - len('Cadence index'))
print('Cadence index', tab, 'Number of hits')

for i, num in enumerate(num_hits_list):
    tab = ' ' * (14 - len(str(i)))
    print(i, tab, num)
Cadence index   Number of hits
0               1229
1               3894
2               5772
3               2437
4               1376
5               1507
6               3082
7               1074
8               3223
9               1098
10              3358
11              1416
12              6521
13              2271
14              2744
15              1010
16              2247
17              1890
18              7014
19              2834
20              4205
21              786
22              4414
23              722
24              3365
25              1485
26              541
27              881
28              1030
29              1808
30              2435
31              925
32              1529
33              916
34              2853
35              1189
36              2590
37              3343
38              1415
39              1466
40              4746
41              8484
42              2315
43              1639
44              2800
45              3683
46              2282
47              5663
48              7571
49              1341
50              6841
51              4839
52              1861
53              3121
54              4694
55              4111
56              1781
57              5449
58              1042
59              2132
60              1325
61              4104
62              1839
63              3844
64              2031
65              1894
66              2937
67              4206
68              1534
69              2536
70              2392
71              2346
72              1838
73              6325
74              4466
75              1777
76              6684
77              1677
78              2844
79              2485
80              2805
81              2049
82              5513
83              1227
84              2326
85              1001
86              1650
87              2465
88              3054
89              1105
90              5877
91              910
92              2256
93              1550
94              1427
95              4900
96              1856
97              1612
98              3274
99              1265
100             5530
101             4398
102             3298
103             1589
104             4767
105             3864
106             3969
107             3656
108             1007
109             1158
110             1294
111             6311
112             2297
113             3316
114             2282
115             3929
116             1755
117             1444
118             1744
119             865
120             1369
121             3127
122             6913
123             2202
124             3241
125             1638
126             2912
127             3867
128             4083
129             6144
130             1319
131             2724
132             7374
133             3228
134             1397
135             4005
136             2026
137             2225
138             1918
139             2195
140             1666
141             1090
142             3461
143             1476
144             4365
145             1512
146             4777
147             1679
148             3261
149             3685
150             727
151             4087
152             1023
153             1125
154             2781
155             2661
156             1034
157             914
158             4167
159             1612
160             1192
161             2759
162             2685
163             2179
164             4266
165             5048
166             3130
167             3548
168             2156
169             5996
170             1576
171             5133
172             1521
173             1351
174             2550
175             2587
176             3002
177             10273
178             1431
179             2642
180             4657
181             5151
182             2231
183             5369
184             4149
185             1514
186             3616
187             2152
188             1174
189             1448
190             3827
191             948
192             1566
193             1164
194             3108
195             1702
196             2326
197             4038
198             1897
199             6376
200             4911
201             6047
202             16055
203             1868
204             4795
205             2269
206             4750
207             3040
208             3220
209             3562
210             3512
211             2126
212             22133
213             1423
214             5229
215             2653
216             1808
217             2354
218             4507
219             1198
220             2450
221             5944
222             3652
223             1253
224             3356
225             3132
226             1839
227             2427
228             2931
229             3029
230             1328
231             2371
232             1677
233             2198
234             2320
235             2465
236             1278
237             4806
238             1915
239             2831
240             4270
241             2412
242             3483
243             3082
244             3167
245             2434
246             2270
247             1937
248             2420
249             3298
250             4115
In [9]:
# Finding the maximum hits for one cadence:

print('Maximum number of hits for a single cadence:', max(num_hits_list))
Maximum number of hits for a single cadence: 22133
In [10]:
# Finding the total number of hits for across all cadences:

print('Total number of hits across all cadences:', np.sum(num_hits_list))
Total number of hits across all cadences: 753967

We can count the number of hits above a certain threshold with the t variable

In [11]:
# Investigating the number of hits per cadence at or above a confidence threshold:

THRESHOLD = 0.999999999

tab = ' ' * (16 - len('Cadence index'))
print('Cadence index', tab, 'Number of {}% hits'.format(THRESHOLD * 100))

filtered_num_hits_list = []

for i in range(len(all_cadences)):
    
    cadence = Cadence(all_cadences[i], results, BAD_FREQS)
    
    k = 0
    
    for j in range(cadence.num_hits):
        if cadence.hits[j][2] >= THRESHOLD:
            k += 1
            
    filtered_num_hits_list.append(k)
    
    tab = ' ' * (16 - len(str(i)))
    print(i, tab, k)
Cadence index     Number of 99.9999999% hits
0                 0
1                 0
2                 0
3                 0
4                 0
5                 1
6                 0
7                 0
8                 0
9                 0
10                0
11                0
12                0
13                0
14                0
15                0
16                0
17                0
18                0
19                0
20                0
21                0
22                0
23                0
24                1
25                0
26                0
27                0
28                0
29                1
30                0
31                0
32                0
33                11
34                0
35                0
36                0
37                0
38                0
39                0
40                0
41                1
42                0
43                0
44                0
45                1
46                0
47                1
48                0
49                0
50                0
51                1
52                7
53                0
54                2
55                0
56                0
57                30
58                0
59                0
60                0
61                0
62                0
63                23
64                0
65                0
66                3
67                10
68                0
69                0
70                0
71                0
72                0
73                0
74                0
75                0
76                0
77                0
78                0
79                0
80                0
81                0
82                0
83                0
84                0
85                0
86                0
87                0
88                0
89                0
90                0
91                0
92                0
93                0
94                0
95                0
96                0
97                0
98                0
99                1
100               0
101               16
102               0
103               0
104               3
105               4
106               1
107               0
108               0
109               0
110               0
111               131
112               1
113               0
114               0
115               5
116               0
117               0
118               0
119               0
120               0
121               0
122               0
123               2
124               0
125               0
126               0
127               0
128               0
129               0
130               0
131               0
132               0
133               0
134               0
135               0
136               0
137               0
138               0
139               0
140               0
141               0
142               0
143               0
144               0
145               0
146               1
147               1
148               0
149               0
150               0
151               0
152               0
153               0
154               0
155               0
156               0
157               0
158               0
159               0
160               0
161               1
162               2
163               1
164               3
165               0
166               20
167               0
168               0
169               0
170               0
171               8
172               0
173               0
174               0
175               0
176               0
177               0
178               0
179               0
180               1
181               0
182               0
183               31
184               0
185               0
186               0
187               0
188               0
189               0
190               0
191               0
192               3
193               0
194               0
195               0
196               0
197               0
198               0
199               1
200               0
201               0
202               1
203               0
204               0
205               0
206               3
207               0
208               0
209               0
210               0
211               1
212               0
213               0
214               0
215               0
216               0
217               0
218               0
219               0
220               1
221               0
222               0
223               0
224               0
225               0
226               1
227               0
228               0
229               1
230               0
231               0
232               0
233               0
234               0
235               0
236               4
237               0
238               0
239               0
240               0
241               0
242               0
243               0
244               0
245               0
246               1
247               0
248               0
249               0
250               0
In [12]:
# Finding the maximum hits at or above the threshold for one cadence:
print('Maximum number of {}% hits for a single cadence:'.format(THRESHOLD * 100), max(filtered_num_hits_list))
Maximum number of 99.9999999% hits for a single cadence: 131
In [13]:
# Finding the total number of hits for across all cadences:
print('Total number of {}% hits across all cadences:'.format(THRESHOLD * 100), np.sum(filtered_num_hits_list))
Total number of 99.9999999% hits across all cadences: 343

Histogram of all Hits

We can then compute and plot the histograms with the proper cuts in frequency.

In [14]:
plt.subplot(2, 1, 1)
scatter_plot_all_hits(all_cadences, results, BAD_FREQS)

plt.subplot(2, 1, 2)
hist_all_hits(all_cadences, results, BAD_FREQS)

plt.gcf().set_size_inches(15, 11)    
plt.show()
In [15]:
results_clean = results
new_cadence_list = get_cadence_list(results_clean)
new_num_hits_list = get_num_hits_per_cadence(new_cadence_list, BAD_FREQS, results_clean)
intervals, abundance = hist_all_hits(new_cadence_list, results_clean, BAD_FREQS)

plt.axhline(y=RFI_FILTER, ls='--', lw=2, color='black', label='Flitering cutoff')
plt.legend()
plt.gcf().set_size_inches(15, 5)    
plt.show()

rfi_intervals = get_rfi_bins(intervals, abundance)

Threshold Visualizations

Having gotten all the plots regarding the search statistics, we can then look at the choosing what threshold we'd want to select. This will help limit the total number of candidates we'd need to search by eye.

In [16]:
thresholds = np.arange(0.95, 1.001, 0.001)

num_list = []
for threshold in thresholds:
    num = get_num_hits_above_threshold(new_cadence_list, results_clean, threshold, BAD_FREQS)
    num_list.append(num)
In [17]:
filtered_num_list = []
for threshold in thresholds:
    num = get_filtered_num_hits_above_threshold(new_cadence_list, results_clean, threshold, rfi_intervals, BAD_FREQS)
    filtered_num_list.append(num)
In [18]:
plt.rc('font', size=12)

plt.plot(thresholds * 100, num_list, 'o', color='red')

plt.grid()
plt.xlabel('Confidence rate threshold (%)')
plt.ylabel('Number of hits at or above threshold')
plt.title('Total number of hits obtained with respect to confidence rate threshold')

plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.ylim(min(num_list), max(num_list))

plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.yticks(num_list, size=8.5)

# Recreate axes on top and on right for easier reading
plt.twiny()
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.xticks(thresholds * 100, size=8.5, rotation=45)

plt.twinx()
plt.ylim(min(num_list), max(num_list))
plt.yticks(num_list, size=8.5)

plt.gcf().set_size_inches(15.5, 17)
plt.show()

3rd Stage Filtering

After applying a threshold filtering, and then applying Frequency cuts, we further remove frequencies where lots of hits are found. This new threshold is set by the constant RFI_THRESHOLD value. This is shown by the green dotted plots.

In [19]:
plt.rc('font', size=12)

plt.plot(thresholds * 100, num_list, 'o', color='red', label='Unfiltered')
plt.plot(thresholds * 100, filtered_num_list, 'o', color='limegreen', label='Filtered')

plt.grid()
plt.legend()
plt.xlabel('Confidence rate threshold (%)')
plt.ylabel('Number of hits at or above threshold')
plt.title('Total number of hits obtained with respect to confidence rate threshold')

ylim = round(max(num_list) + min(filtered_num_list) + 50, -2)

plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.ylim(0, ylim)

plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.yticks(np.linspace(0, ylim, 51), size=8.5)

# Recreate axes on top and on right for easier reading
plt.twiny()
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.xticks(thresholds * 100, size=8.5, rotation=45)

plt.twinx()
plt.ylim(0, ylim)
plt.yticks(np.linspace(0, ylim, 51), size=8.5)

plt.gcf().set_size_inches(15.5, 17)
plt.show()

Creating the Final Product

The end goal of these visualizations is to make an informed decision on how to select these filtering parameters when developing the animation product that will be reviewed by the human eye. This will help characterize the size of the human trial and such. To produce the final product we run the following script.

To visualize and save ALL the plots from all the candidates searched above some threshold we have the following code. Once again there are 2 major modes to consider. Filtering on verus Filtering Off for this example we will have filtering OFF for now. In production we'd have it on.

In [ ]:
generate_and_save_visualizers(dirname=results, threshold=0.999999, output_dirname='visualizers_clean_test',BAD_FREQS=BAD_FREQS)

Final Animation

We can then render all these plots into a handy animation which we can then watch and record the results our selves! Note, this cell requires you to have ran the previous cell.

In [ ]:
generate_movie(img_dir='visualizers_clean_test', video_name='movie_clean_test', fps=1)